Abstract: The mathematical properties of a family of generalized beta distributions, including the beta-normal, skewed-t, log-F, beta-exponential, and beta-Weibull distributions, have recently been studied in several publications. This paper applies these distributions to modeling the size distribution of income and computes maximum likelihood estimates of their parameters. Their performance is compared with that of the widely used generalized beta distributions of the first and second kinds in terms of goodness-of-fit measures.
1. Introduction

Consider the distribution function of a beta random variable given by 

$$
G(y) = [B(\alpha,\beta)]^{-1}\int_0^y t^{\alpha-1}(1-t)^{\beta-1}\,dt, \tag{1.1}
$$

for $0 < y < 1$, where $\alpha > 0$, $\beta > 0$, and the beta function is $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. Note that the domain of $G(\cdot)$ is $(0, 1)$. Because the range of a cumulative distribution function (cdf) also lies in $[0, 1]$, the upper limit $y$ of the integral in (1.1) can be replaced with a cdf $F(x)$; this construction has been studied in several papers reviewed below. The resulting probability density function is

$$
g_F(x) = [B(\alpha,\beta)]^{-1} f(x)\,[F(x)]^{\alpha-1}\,[1-F(x)]^{\beta-1}, \tag{1.2}
$$

where $f(\cdot)$ is the derivative of $F(\cdot)$ and is therefore the corresponding probability density function when $F(\cdot)$ is a distribution function. For simplicity, this distribution will be called the generalized beta-F distribution hereafter.
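Because (1.2) needs only $F$ and its derivative $f$, it can be evaluated generically. The Python sketch below (standard library only) is one way to do so; the function names are hypothetical, and the standard normal parent is chosen purely for illustration, which makes the result a beta-normal density. Working on the log scale keeps $[F(x)]^{\alpha-1}$ and $[1-F(x)]^{\beta-1}$ stable when $F(x)$ is close to 0 or 1.

```python
import math
from typing import Callable


def log_beta(alpha: float, beta: float) -> float:
    """log B(alpha, beta) via log-gamma, which avoids overflow for large shapes."""
    return math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)


def beta_f_pdf(x: float, alpha: float, beta: float,
               F: Callable[[float], float],
               f: Callable[[float], float]) -> float:
    """Generalized beta-F density (1.2): f(x) F(x)^(alpha-1) [1-F(x)]^(beta-1) / B(alpha, beta)."""
    u = F(x)
    if u <= 0.0 or u >= 1.0:
        # F(x) has rounded to 0 or 1 in floating point; return 0 there
        # (a simplification that ignores a possibly unbounded endpoint).
        return 0.0
    log_kernel = ((alpha - 1.0) * math.log(u)
                  + (beta - 1.0) * math.log1p(-u)
                  - log_beta(alpha, beta))
    return f(x) * math.exp(log_kernel)


def norm_cdf(x: float) -> float:
    """Standard normal cdf, used only as an example parent F."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x: float) -> float:
    """Standard normal density, the derivative of norm_cdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# alpha = beta = 1 recovers the parent density; other shapes give a beta-normal.
print(beta_f_pdf(0.3, 1.0, 1.0, norm_cdf, norm_pdf), norm_pdf(0.3))
print(beta_f_pdf(0.3, 2.5, 0.7, norm_cdf, norm_pdf))
```

For any $\alpha, \beta > 0$ the density integrates to one, since the substitution $t = F(x)$ turns $\int g_F(x)\,dx$ into the integral of the beta density over $(0, 1)$.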

Jones (2004) introduced this as the probability density function of the transformed random variable $X = F^{-1}(Y)$, where $Y$ is a beta random variable with parameters $\alpha$ and $\beta$. The density in (1.2) was also described as a simple generalization of the collection of order-statistic distributions associated with $F$. Jones (2004) and Ferreira et al. (2004) explored general properties of this family of distributions and examined the special cases of the skewed-t and log-F distributions. Since $F(\cdot)$ can be any distribution function, the generalized beta-F family is very rich and can be explored further. This family of distributions was first introduced by Singh et al. (1988) and has since been studied for several distribution functions. In this paper, it is applied to the analysis of U.S. family income data.
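Jones's representation $X = F^{-1}(Y)$ doubles as a simulation recipe: draw $Y$ from a beta distribution and pass it through the quantile function of $F$. The Python sketch below uses the standard library's `random.Random.betavariate`; the exponential parent $F(x) = 1 - e^{-ax}$ (the beta-exponential case discussed in Section 2) and the helper names `sample_beta_f` and `exp_quantile` are assumptions made only for illustration. For this parent, a standard result gives the mean $[\Psi(\alpha+\beta) - \Psi(\beta)]/a$, which equals $\Psi(3) - \Psi(1) = 1.5$ for $\alpha = 2$, $\beta = 1$, $a = 1$.

```python
import math
import random
from typing import Callable, List


def sample_beta_f(n: int, alpha: float, beta: float,
                  F_inv: Callable[[float], float],
                  rng: random.Random) -> List[float]:
    """Draw n values of X = F^{-1}(Y) with Y ~ Beta(alpha, beta)."""
    return [F_inv(rng.betavariate(alpha, beta)) for _ in range(n)]


def exp_quantile(u: float, a: float = 1.0) -> float:
    """Inverse of the (hypothetical example) parent cdf F(x) = 1 - exp(-a x)."""
    return -math.log1p(-u) / a


rng = random.Random(2005)  # fixed seed for reproducibility
draws = sample_beta_f(100_000, 2.0, 1.0, exp_quantile, rng)
print(sum(draws) / len(draws))  # sample mean; the theoretical mean is 1.5
```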

Numerous distributions (see McDonald, 1984, and references therein), including the gamma, beta, Singh-Maddala (or Burr), Pareto, Weibull, and generalized beta of the first and second kinds, have been used to model the size distribution of income. McDonald (1984) fit the above models to income data for 1970, 1975, and 1980 and concluded that the generalized beta of the second kind provided the best relative fit and that the Singh-Maddala (SM) distribution provided a better fit than the generalized beta of the first kind. McDonald also discussed the relationships among several widely used models for the income distribution; those relationships can be extended to the generalized beta-F family in (1.2), which includes some of these distributions as special cases.

In this paper, examples of the generalized beta-F family (1.2) in the existing literature are summarized in Section 2. The distributions tabulated in Table 1 are fit to U.S. family income data presented in grouped format on the Census Bureau's website. Outlines of the maximum likelihood estimation of the unknown population parameters involved in the generalized beta-F distribution and in the function $F(\cdot)$ are derived for the grouped income data in Section 3. The objective function to be maximized and its gradient do not have closed forms and depend on the function $F(\cdot)$ of interest. Besides the parameter estimates and the associated estimated means, goodness-of-fit values, including the sum of squared errors, the sum of absolute deviations, and chi-square statistics, are reported for comparison in Section 4, together with performance comparisons of the distributions.
2. The Models

The probability functions of interest and their means and moments are summarized in Table 1. Technical details on the characteristics of each distribution, such as its shapes, moments, skewness, and limiting distributions as some parameters tend to extreme values, can be found in the cited references.

The generalized beta distributions of the first (GB1) and second (GB2) kinds (McDonald, 1984) are defined, respectively, by

$$
\begin{aligned}
\mathrm{GB1}(y; a, b, \alpha, \beta) &= \frac{a\,y^{a\alpha-1}\,[1-(y/b)^a]^{\beta-1}}{b^{a\alpha}B(\alpha,\beta)}, && 0 < y < b,\\
\mathrm{GB2}(y; a, b, \alpha, \beta) &= \frac{a\,y^{a\alpha-1}}{b^{a\alpha}B(\alpha,\beta)\,[1+(y/b)^a]^{\alpha+\beta}}, && y > 0.
\end{aligned}
\tag{2.1}
$$

They are special cases of the generalized beta-F distribution (1.2) with $F(x) = (x/b)^a$ for $0 < x < b$ and $F(x) = 1 - [1 + (x/b)^a]^{-1}$ for $x > 0$, respectively. The underlying income distribution in Thurow (1970), which is the GB1 in (2.1) with $a = 1$, is therefore also a special case, with $F$ the distribution function of a uniform distribution over the interval $(0, b)$. The Singh-Maddala distribution, with density $a\beta y^{a-1}/\{b^a[1 + (y/b)^a]^{\beta+1}\}$, is the special case of the GB2 with beta parameter $\alpha = 1$. Note that these distributions are unimodal.
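The special-case claims can be confirmed numerically by plugging each parent $F$ into (1.2) and comparing against the closed forms in (2.1). The Python sketch below does this under the assumption $a > 0$; all function names are hypothetical, and the parameter values are arbitrary test points rather than estimates from the paper.

```python
import math


def log_beta(p: float, q: float) -> float:
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_f_pdf(x, alpha, beta, F, f):
    """Generalized beta-F density (1.2) for a parent cdf F with density f."""
    u = F(x)
    return f(x) * math.exp((alpha - 1.0) * math.log(u)
                           + (beta - 1.0) * math.log1p(-u) - log_beta(alpha, beta))


def gb1_pdf(y, a, b, alpha, beta):
    """GB1 density from (2.1), for 0 < y < b and a > 0."""
    return (a * y ** (a * alpha - 1.0) * (1.0 - (y / b) ** a) ** (beta - 1.0)
            / (b ** (a * alpha) * math.exp(log_beta(alpha, beta))))


def gb2_pdf(y, a, b, alpha, beta):
    """GB2 density from (2.1), for y > 0 and a > 0."""
    return (a * y ** (a * alpha - 1.0)
            / (b ** (a * alpha) * math.exp(log_beta(alpha, beta))
               * (1.0 + (y / b) ** a) ** (alpha + beta)))


def singh_maddala_pdf(y, a, b, q):
    """Singh-Maddala density a q y^(a-1) / (b^a [1 + (y/b)^a]^(q+1))."""
    return a * q * y ** (a - 1.0) / (b ** a * (1.0 + (y / b) ** a) ** (q + 1.0))


def gb1_parent(a, b):
    """Parent F(x) = (x/b)^a on (0, b) and its derivative."""
    return (lambda x: (x / b) ** a,
            lambda x: (a / b) * (x / b) ** (a - 1.0))


def gb2_parent(a, b):
    """Parent F(x) = 1 - [1 + (x/b)^a]^(-1) on (0, inf) and its derivative."""
    return (lambda x: 1.0 - 1.0 / (1.0 + (x / b) ** a),
            lambda x: (a / b) * (x / b) ** (a - 1.0) / (1.0 + (x / b) ** a) ** 2)


F2, f2 = gb2_parent(2.7, 8.3)
print(beta_f_pdf(3.0, 0.49, 1.11, F2, f2), gb2_pdf(3.0, 2.7, 8.3, 0.49, 1.11))
```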

Eugene (2002) studies the properties of the beta-normal (BN) distribution, i.e., (1.2) with $F$ a normal distribution function. Gupta and Nadarajah (2004) further derived a different form of the moments of the beta-normal distribution. The beta-normal can be either unimodal or bimodal. Eugene (2002) showed that it is skewed to the right when $\alpha > \beta$ and skewed to the left when $\alpha < \beta$. When $\alpha = \beta$, it is symmetric about $\mu$. When $\alpha = \beta < 1$, it has heavy symmetric tails, and bimodality eventually occurs as $\alpha$ $(=\beta)$ decreases. When $\alpha = \beta > 1$, it has long symmetric tails, with a higher peak associated with a larger value of $\alpha$ $(=\beta)$.
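The symmetry claim for $\alpha = \beta$ is easy to check numerically. The Python sketch below is an illustration under our own conventions, not code from the paper: the hypothetical helper `beta_normal_pdf` evaluates (1.2) with $F = \Phi(\cdot;\mu,\sigma)$, computing $\Phi$ and $1-\Phi$ separately with `math.erfc` so that neither tail loses precision to cancellation.

```python
import math


def beta_normal_pdf(x: float, alpha: float, beta: float,
                    mu: float = 0.0, sigma: float = 1.0) -> float:
    """Beta-normal density: (1.2) with F = Phi(.; mu, sigma)."""
    z = (x - mu) / sigma
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))   # Phi(z)
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))    # 1 - Phi(z), no cancellation
    if lower == 0.0 or upper == 0.0:
        return 0.0                                  # far-tail underflow
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return phi * math.exp((alpha - 1.0) * math.log(lower)
                          + (beta - 1.0) * math.log(upper) - log_b)


# alpha = beta: the density at mu + d equals the density at mu - d.
print(beta_normal_pdf(3.0, 0.4, 0.4, 1.0, 2.0), beta_normal_pdf(-1.0, 0.4, 0.4, 1.0, 2.0))
```

With $\alpha = \beta$, reflecting $x$ about $\mu$ exchanges $\Phi$ and $1 - \Phi$ while leaving $\phi$ unchanged, which is exactly why the density is symmetric about $\mu$.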

The two particular distributions that Jones (2004) believed provide the most tractable instances of families with power and exponential tails are the skew-t distribution (beta-t) and the log-F (beta-logistic) distribution. The skew-t distribution (Jones, 2001; Jones and Faddy, 2003) is obtained with $F(t) = [1 + t(a + b + t^2)^{-1/2}]/2$, the distribution function of the Student t distribution on 2 degrees of freedom scaled by the factor $\sqrt{(a+b)/2}$, where $a = \alpha$ and $b = \beta$. The skewed-t reduces to the symmetric Student t distribution when $a = b$ and is skewed when $a \neq b$. It is unimodal and heavy tailed, and its skewness, measured by the third moment, is a monotone increasing function of $a$ for fixed $b$ and a monotone decreasing function of $b$ for fixed $a$.
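The skew-t mean can be cross-checked by numerical integration. With the substitution $u = F(t)$, $E(X) = \int_0^1 F^{-1}(u)\,u^{a-1}(1-u)^{b-1}\,du / B(a,b)$, and solving $F(t) = u$ gives $F^{-1}(u) = (2u-1)\sqrt{a+b}\,/\,[2\sqrt{u(1-u)}]$. The Python sketch below compares this integral (midpoint rule, our own choice) with the closed form $(a-b)\sqrt{a+b}\,\Gamma(a-\tfrac12)\Gamma(b-\tfrac12)/\{2\Gamma(a)\Gamma(b)\}$, which exists for $a, b > 1/2$; the function names are hypothetical.

```python
import math


def skew_t_quantile(u: float, a: float, b: float) -> float:
    """Inverse of F(t) = [1 + t / sqrt(a + b + t^2)] / 2."""
    return (2.0 * u - 1.0) * math.sqrt(a + b) / (2.0 * math.sqrt(u * (1.0 - u)))


def skew_t_mean_closed(a: float, b: float) -> float:
    """Closed-form mean of the skew-t, valid for a > 1/2 and b > 1/2."""
    return ((a - b) * math.sqrt(a + b) * math.gamma(a - 0.5) * math.gamma(b - 0.5)
            / (2.0 * math.gamma(a) * math.gamma(b)))


def skew_t_mean_numeric(a: float, b: float, n: int = 200_000) -> float:
    """E(X) as the integral of F^{-1}(u) times the Beta(a, b) density (midpoint rule)."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        dens = math.exp((a - 1.0) * math.log(u) + (b - 1.0) * math.log1p(-u) - log_b)
        total += skew_t_quantile(u, a, b) * dens
    return total * h


print(skew_t_mean_closed(3.0, 2.0), skew_t_mean_numeric(3.0, 2.0))
```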

The log-F distribution is a special case of family (1.2) with the standard logistic distribution function $F(x) = e^x/(1 + e^x)$; $F$ can also be taken to be other types of generalized logistic distributions. Brown et al. (2002) presented examples of application areas, including survival analysis, in which the log-normal, Weibull, log-logistic, and generalized gamma distributions were shown to be special cases of the log-F model; see Kalbfleisch and Prentice (1980). The log-F is unimodal and can be symmetric or skewed to the left or right. A generalized four-parameter version of the log-F, with location parameter $a$, scale parameter $b$, and shape parameters $\alpha$ and $\beta$, obtained by replacing $x$ by $(x - a)/b$ in the distribution function $F(x)$, will be fit to the income data in this paper.
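For the logistic parent, (1.2) collapses to the closed form $e^{\alpha z}/\{B(\alpha,\beta)(1+e^z)^{\alpha+\beta}\}$. The Python sketch below evaluates the four-parameter version assuming the usual location-scale convention, density $g((x-a)/b)/b$; the argument names `loc` and `scale` stand for $a$ and $b$ and, like the function names, are our own. Under this convention the mean is $a + b[\Psi(\alpha) - \Psi(\beta)]$, a standard property of the log-F.

```python
import math


def log_f_pdf(x: float, alpha: float, beta: float,
              loc: float = 0.0, scale: float = 1.0) -> float:
    """Four-parameter log-F density, assuming the convention g((x - loc)/scale)/scale.

    For the standard logistic parent, (1.2) reduces to
    exp(alpha z) / (B(alpha, beta) (1 + e^z)^(alpha + beta)).
    """
    z = (x - loc) / scale
    # log(1 + e^z), computed without overflow for large |z|
    log1pexp = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(alpha * z - (alpha + beta) * log1pexp - log_b) / scale


def logistic_cdf(z: float) -> float:
    """Standard logistic cdf F(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


print(log_f_pdf(2.5, 3.0, 2.0, loc=2.0, scale=1.5))
```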

Nadarajah (2004) derived the moment generating function, skewness, kurtosis, and other properties of the beta-exponential (BE) distribution, which has an exponential $F$. Both the skewness and the kurtosis are shown to decrease monotonically with the parameters $\alpha$ and $\beta$. Famoye et al. (2005) studied the beta-Weibull distribution with $F(x) = 1 - \exp(-ax^b)$. The beta-Weibull is unimodal, and its mode is at zero when $b < 1$; that is, the beta-Weibull (BW) distribution has a reversed-J shape when $b < 1$. Note that the exponential distribution is a special case of the Weibull distribution. Nadarajah and Kotz (2004) investigated the unimodal beta-Gumbel distribution, in the hope of wider applicability in engineering given the wide use of the Gumbel distribution in that field, and showed that it has a single mode and an increasing hazard function.
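The beta-Weibull nests the beta-exponential ($b = 1$) and the Weibull ($\alpha = \beta = 1$), which gives two inexpensive consistency checks for an implementation. The Python sketch below is illustrative only; the helper names are hypothetical, and the parameterization $F(x) = 1 - \exp(-ax^b)$ follows the text. The beta-exponential is written in its own closed form, $a e^{-a\beta x}(1 - e^{-ax})^{\alpha-1}/B(\alpha,\beta)$, so the comparison is not circular.

```python
import math


def _log_beta(p: float, q: float) -> float:
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_weibull_pdf(x: float, alpha: float, beta: float, a: float, b: float) -> float:
    """Beta-Weibull density: (1.2) with F(x) = 1 - exp(-a x^b), x > 0."""
    if x <= 0.0:
        return 0.0
    s = a * x ** b                    # so that 1 - F(x) = exp(-s)
    F = -math.expm1(-s)               # 1 - exp(-s) without cancellation
    f = a * b * x ** (b - 1.0) * math.exp(-s)
    # [1 - F]^(beta - 1) = exp(-(beta - 1) s), used on the log scale directly.
    return f * math.exp((alpha - 1.0) * math.log(F) - (beta - 1.0) * s
                        - _log_beta(alpha, beta))


def beta_exponential_pdf(x: float, alpha: float, beta: float, a: float) -> float:
    """Closed-form beta-exponential density a e^(-a beta x) (1 - e^(-a x))^(alpha-1) / B."""
    return (a * math.exp(-a * beta * x) * (-math.expm1(-a * x)) ** (alpha - 1.0)
            / math.exp(_log_beta(alpha, beta)))


def weibull_pdf(x: float, a: float, b: float) -> float:
    """Weibull density with cdf 1 - exp(-a x^b)."""
    return a * b * x ** (b - 1.0) * math.exp(-a * x ** b)


print(beta_weibull_pdf(7.5, 1.7, 0.8, 0.257, 1.0), beta_exponential_pdf(7.5, 1.7, 0.8, 0.257))
```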

Table 1 lists the moments and means of the various beta-F distributions fit to the size distribution of income. The means in the table are used as a check on the validity of the parameter estimates produced by the computer algorithms of the next section. Let $\Phi(x; \mu, \sigma)$ be the distribution function of a normal random variable with mean $\mu$ and standard deviation $\sigma$, and let the digamma function $\Psi(x) = d\log\Gamma(x)/dx$ be Euler's psi function; see Gradshteyn and Ryzhik (2000). Define $I_{n,k} = \int_{-\infty}^{\infty} x^n f(x)\,[1 - F(x)]^k\,dx$.



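Table 1. Distributions and Their Moments

GB1: $F(x) = (x/b)^a$, $0 < x < b$
  Moments: $E(X^n) = b^n B(\alpha+\beta, n/a)/B(\alpha, n/a)$
  Mean: $b\,B(\alpha+\beta, 1/a)/B(\alpha, 1/a)$

GB2: $F(x) = 1 - [1 + (x/b)^a]^{-1}$, $x > 0$
  Moments: $E(X^n) = b^n B(\alpha + n/a, \beta - n/a)/B(\alpha, \beta)$, for $-a\alpha < n < a\beta$
  Mean: $b\,B(\alpha + 1/a, \beta - 1/a)/B(\alpha, \beta)$

Beta-Normal (BN): $F(x) = \Phi(x; \mu, \sigma)$
  Moments (integer $\alpha$, $\beta$): $E(X^n) = [B(\alpha,\beta)]^{-1}\sum_{j=0}^{\beta-1}\sum_{k=0}^{\alpha+j-1}(-1)^{j+k}\binom{\beta-1}{j}\binom{\alpha+j-1}{k} I_{n,k}$
  Mean: the moment expression with $n = 1$

Skew-t: $F(x) = \frac{1}{2}\left[1 + x(a+b+x^2)^{-1/2}\right]$, with $a = \alpha$, $b = \beta$
  Moments: $E(X^n) = \frac{(a+b)^{n/2}}{2^n B(a,b)}\sum_{i=0}^{n}\binom{n}{i}(-1)^i B\left(a + \frac{n}{2} - i,\ b - \frac{n}{2} + i\right)$, for $a > n/2$, $b > n/2$
  Mean: $\frac{(a-b)\sqrt{a+b}\,\Gamma(a-1/2)\,\Gamma(b-1/2)}{2\,\Gamma(a)\,\Gamma(b)}$

Log-F: $F(x) = e^x/(1+e^x)$
  Moments (integer $\alpha$, $\beta$): the same double sum as for the beta-normal, with $I_{n,k}$ evaluated for the logistic $F$
  Mean: the moment expression with $n = 1$

Beta-exponential (BE): $F(x) = 1 - \exp(-ax)$
  Moments: the beta-Weibull expression with $b = 1$
  Mean: $[\Psi(\alpha+\beta) - \Psi(\beta)]/a$

Beta-Weibull (BW): $F(x) = 1 - \exp(-ax^b)$
  Moments: $E(X^n) = \frac{\Gamma(\alpha+\beta)\,\Gamma(n/b+1)}{\Gamma(\beta)\,a^{n/b}}\sum_{k=0}^{\infty}\frac{(-1)^k(\beta+k)^{-(n+b)/b}}{k!\,\Gamma(\alpha-k)}$
  Mean: $\frac{\Gamma(\alpha+\beta)\,\Gamma(1/b+1)}{\Gamma(\beta)\,a^{1/b}}\sum_{k=0}^{\infty}\frac{(-1)^k(\beta+k)^{-(1+b)/b}}{k!\,\Gamma(\alpha-k)}$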
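Since the beta-Weibull entry is an infinite series, it is worth checking numerically before using it to validate fitted parameters. The Python sketch below evaluates the series, rewriting $1/\{k!\,\Gamma(\alpha-k)\}$ as a generalized binomial coefficient $\binom{\alpha-1}{k}$ divided by $\Gamma(\alpha)$ so that $\Gamma$ is never evaluated at a pole, and compares it with direct numerical integration of $E(X^n) = \int_0^1 [F^{-1}(u)]^n g(u)\,du$, where $F^{-1}(u) = [-\ln(1-u)/a]^{1/b}$ and $g$ is the beta density. Function names, the truncation length, and the midpoint rule are our own choices. With $b = 1$ the series reproduces the beta-exponential mean $[\Psi(\alpha+\beta)-\Psi(\beta)]/a$; for $\alpha = 2$, $\beta = 1$, $a = 1$ that is $\Psi(3) - \Psi(1) = 1.5$.

```python
import math


def bw_moment_series(n, alpha, beta, a, b, terms=200_000):
    """E(X^n) for the beta-Weibull, from the series for its moments.

    1/(k! Gamma(alpha - k)) = C(alpha - 1, k) / Gamma(alpha), where C is a
    generalized binomial coefficient; it is updated recursively in k.
    """
    s = n / b + 1.0                  # (n + b) / b
    coef = 1.0                       # (-1)^k C(alpha - 1, k), starting at k = 0
    total = 0.0
    for k in range(terms):
        total += coef * (beta + k) ** (-s)
        coef *= -(alpha - 1.0 - k) / (k + 1.0)
        if coef == 0.0:              # integer alpha: the series terminates
            break
    log_front = (math.lgamma(alpha + beta) + math.lgamma(s) - math.lgamma(alpha)
                 - math.lgamma(beta) - (n / b) * math.log(a))
    return math.exp(log_front) * total


def bw_moment_numeric(n, alpha, beta, a, b, m=200_000):
    """E(X^n) as the integral of [F^{-1}(u)]^n times the Beta density (midpoint rule)."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    h = 1.0 / m
    total = 0.0
    for k in range(m):
        u = (k + 0.5) * h
        x = (-math.log1p(-u) / a) ** (1.0 / b)          # F^{-1}(u)
        total += x ** n * math.exp((alpha - 1.0) * math.log(u)
                                   + (beta - 1.0) * math.log1p(-u) - log_b)
    return total * h


print(bw_moment_series(1, 2.5, 1.5, 0.5, 2.0), bw_moment_numeric(1, 2.5, 1.5, 0.5, 2.0))
```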
3. Maximum Likelihood Estimation

The income data were in grouped format, with only the frequency and mean income of each group given. Let $G$ and $g$ be, respectively, the cumulative distribution function and probability density function of a beta random variable $Y$ as in (1.1). Let $\theta_G = (\alpha, \beta)^T$ and $\theta_F = (a, b)^T$ be the column vectors of parameters associated with the beta distribution $G$ and the distribution function $F$ in (1.1) and (1.2), respectively. Define the probability

$$
P_i(\theta_G,\theta_F) = \int_{I_i} g_F(x;\theta_G,\theta_F)\,dx = [B(\alpha,\beta)]^{-1}\int_{F(x_{i-1};\theta_F)}^{F(x_i;\theta_F)} t^{\alpha-1}(1-t)^{\beta-1}\,dt. \tag{3.1}
$$

It is the proportion of the population in the $i$th of the $r$ income groups, defined by the interval $I_i = [x_{i-1}, x_i)$. The likelihood function for the data is therefore given by

$$
\mathcal{L}(\theta_G,\theta_F) = \frac{N!}{\prod_{i=1}^{r} n_i!}\prod_{i=1}^{r}[P_i(\theta_G,\theta_F)]^{n_i},
$$

where $n_i$, $i = 1, \ldots, r$, is the frequency of the $i$th group and $N = \sum_{i=1}^{r} n_i$. The maximum likelihood estimators are obtained by maximizing the log-likelihood, which up to an additive constant is

$$
L(\theta_G,\theta_F) = \sum_{i=1}^{r} n_i \ln P_i(\theta_G,\theta_F). \tag{3.2}
$$

Although estimators obtained by maximizing the multinomial likelihood in (3.2) are known to be less efficient than those based on individual observations, they are asymptotically efficient. Note that the group probability $P_i(\theta_G, \theta_F)$ in (3.1) can be obtained by first evaluating the beta cdf $G$ at $F(x_{i-1}; \theta_F)$ and $F(x_i; \theta_F)$ and then taking the difference between the two values. This reduces the programming required to calculate the integrals, because algorithms for evaluating cdfs are readily available in most statistical software.
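The cdf-differencing idea can be sketched in a few lines. Python's standard library has no beta cdf, so this sketch includes the standard continued-fraction evaluation of the regularized incomplete beta function (modified Lentz method); in practice one would call a library routine such as SAS's `PROBBETA` or `scipy.special.betainc`. The exponential parent `be_cdf`, its default rate, and all function names are hypothetical, chosen only to make the example concrete. Class limits may include `0.0` and `math.inf`, at which the parent cdf returns exactly 0 and 1.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last multiplicative update was ~1
            break
    return h


def beta_cdf(x, alpha, beta):
    """Regularized incomplete beta I_x(alpha, beta): the beta cdf G in (1.1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
                     + alpha * math.log(x) + beta * math.log1p(-x))
    if x < (alpha + 1.0) / (alpha + beta + 2.0):
        return front * _betacf(alpha, beta, x) / alpha
    return 1.0 - front * _betacf(beta, alpha, 1.0 - x) / beta


def group_probs(limits, alpha, beta, F):
    """P_i in (3.1) as G(F(x_i)) - G(F(x_{i-1})) for class limits x_0 < ... < x_r."""
    G = [beta_cdf(F(x), alpha, beta) for x in limits]
    return [G[i] - G[i - 1] for i in range(1, len(G))]


def grouped_loglik(counts, limits, alpha, beta, F):
    """Log-likelihood (3.2): the sum of n_i ln P_i."""
    probs = group_probs(limits, alpha, beta, F)
    return sum(n * math.log(p) for n, p in zip(counts, probs))


def be_cdf(x, a=0.25):
    """Hypothetical parent for illustration: exponential cdf 1 - exp(-a x)."""
    return -math.expm1(-a * x)


limits = [0.0, 1.0, 2.5, 5.0, 10.0, math.inf]
print(group_probs(limits, 1.7, 0.8, be_cdf))
```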

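Next, the first derivatives of $L(\theta_G, \theta_F)$ are presented. Let $\Theta = (\theta_G^T, \theta_F^T)^T$. The first derivatives of $L(\Theta)$ with respect to $\Theta$ are

$$
\frac{\partial L(\Theta)}{\partial \Theta} = \sum_{i=1}^{r}\frac{n_i}{P_i(\Theta)}\,\frac{\partial P_i(\Theta)}{\partial \Theta}. \tag{3.3}
$$

Note that the parameter vector $\theta_F$ contains the parameters involved in the function $F$. The derivatives of $P_i(\theta_G, \theta_F)$ in (3.1) with respect to $\theta_G$ and $\theta_F$ are given by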

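$$
\begin{aligned}
\frac{\partial P_i(\theta_G,\theta_F)}{\partial \theta_G} &= \int_{F(x_{i-1};\theta_F)}^{F(x_i;\theta_F)} \frac{\partial g(t;\theta_G)}{\partial \theta_G}\,dt, \\
\frac{\partial g(t;\theta_G)}{\partial \alpha} &= g(t;\theta_G)\left[\ln t - \frac{\partial \log B(\alpha,\beta)}{\partial \alpha}\right], \\
\frac{\partial g(t;\theta_G)}{\partial \beta} &= g(t;\theta_G)\left[\ln(1-t) - \frac{\partial \log B(\alpha,\beta)}{\partial \beta}\right], \\
\frac{\partial P_i(\theta_G,\theta_F)}{\partial \theta_F} &= g[F(x_i;\theta_F);\theta_G]\,\frac{\partial F(x_i;\theta_F)}{\partial \theta_F} - g[F(x_{i-1};\theta_F);\theta_G]\,\frac{\partial F(x_{i-1};\theta_F)}{\partial \theta_F},
\end{aligned}
\tag{3.4}
$$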

where $g(t;\theta_G) = t^{\alpha-1}(1-t)^{\beta-1}/B(\alpha,\beta)$ is the beta density, $\partial \log B(\alpha,\beta)/\partial\alpha = \Psi(\alpha) - \Psi(\alpha+\beta)$, and $\partial \log B(\alpha,\beta)/\partial\beta = \Psi(\beta) - \Psi(\alpha+\beta)$. The nonlinear optimization subroutines in SAS can be employed by specifying the function in (3.2) to be maximized and the gradient in (3.3). Both the likelihood function and the gradient vary with the distribution function $F(\cdot)$ under consideration, and the resulting functional forms of (3.4) are tedious; they are therefore not presented for the individual $F(\cdot)$ considered here.
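Hand-derived gradients such as the $\theta_F$ part of (3.4) are easy to get wrong, so it is worth checking them against central finite differences before passing them to an optimizer. The Python sketch below does this for a hypothetical one-parameter exponential parent $F(x; a) = 1 - e^{-ax}$, for which $\partial F/\partial a = x e^{-ax}$. The beta cdf is computed with the standard continued-fraction method because Python's standard library lacks one; the function names and test values are our own.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, alpha, beta):
    """Regularized incomplete beta I_x(alpha, beta)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
                     + alpha * math.log(x) + beta * math.log1p(-x))
    if x < (alpha + 1.0) / (alpha + beta + 2.0):
        return front * _betacf(alpha, beta, x) / alpha
    return 1.0 - front * _betacf(beta, alpha, 1.0 - x) / beta


def beta_pdf(t, alpha, beta):
    """Beta density g(t; theta_G) for 0 < t < 1."""
    return math.exp((alpha - 1.0) * math.log(t) + (beta - 1.0) * math.log1p(-t)
                    - math.lgamma(alpha) - math.lgamma(beta) + math.lgamma(alpha + beta))


def F_exp(x, a):
    """Hypothetical parent F(x; a) = 1 - exp(-a x)."""
    return -math.expm1(-a * x)


def dF_exp_da(x, a):
    """Partial derivative of F(x; a) with respect to a."""
    return x * math.exp(-a * x)


def P_i(lo, hi, alpha, beta, a):
    """Group probability (3.1) for the interval [lo, hi)."""
    return beta_cdf(F_exp(hi, a), alpha, beta) - beta_cdf(F_exp(lo, a), alpha, beta)


def dP_da(lo, hi, alpha, beta, a):
    """theta_F part of (3.4) for the single parameter a."""
    return (beta_pdf(F_exp(hi, a), alpha, beta) * dF_exp_da(hi, a)
            - beta_pdf(F_exp(lo, a), alpha, beta) * dF_exp_da(lo, a))


step = 1e-5
fd = (P_i(2.5, 5.0, 1.7, 0.8, 0.257 + step) - P_i(2.5, 5.0, 1.7, 0.8, 0.257 - step)) / (2 * step)
print(dP_da(2.5, 5.0, 1.7, 0.8, 0.257), fd)
```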



4. Estimation and Comparisons 

The Newton-Raphson nonlinear optimization method in SAS was employed, with the objective function and the corresponding gradient function specified. The income data were in grouped format and can be found on the Census Bureau's web site. The first group consists of families making less than \$25,000, and the last group of families making more than \$250,000. In the evaluation of (3.1) and (3.2), the value of the cdf $F(\cdot)$ is set to 0 at the lower boundary of the first class and to 1 at the upper boundary of the last class in our SAS programs.

The results for the years 2003, 2004 and 2005 are reported in Tables 2, 3 and 4. The mean ($\mu_i$) and frequency ($n_i$) of each group are reported on the Census Bureau's web site, and the approximate mean income for each year can be calculated as $\sum_i n_i\mu_i / \sum_i n_i$. The approximate sample mean incomes (in \$10,000) for 2003, 2004 and 2005 are 6.598, 6.140 and 7.040, respectively. Note that 2004 has the lowest mean of the three years. The estimated means in the following tables are calculated by substituting the estimated parameter values into the mean expressions given in Table 1. The skewed-t appears to overestimate the mean. The sum of squared errors (SSE) and the sum of absolute errors (SAE) between the relative frequency $n_i/N$ and the estimated probability $P_i(\hat{\Theta})$, as well as the chi-square statistic, are also reported.
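The approximate sample mean and the three goodness-of-fit measures can be computed as sketched below. The paper does not spell out its chi-square formula, so this Python sketch assumes the Pearson form $\sum_i (n_i - NP_i)^2/(NP_i)$; the function names are hypothetical, the example numbers are made up, and Tables 2 to 4 report SSE multiplied by 1000.

```python
from typing import List, Tuple


def grouped_mean(counts: List[float], group_means: List[float]) -> float:
    """Approximate sample mean: sum_i n_i mu_i / sum_i n_i."""
    return sum(n * m for n, m in zip(counts, group_means)) / sum(counts)


def fit_measures(counts: List[float], probs: List[float]) -> Tuple[float, float, float]:
    """SSE and SAE between n_i/N and P_i, plus an (assumed) Pearson chi-square."""
    N = sum(counts)
    rel = [n / N for n in counts]
    sse = sum((r - p) ** 2 for r, p in zip(rel, probs))
    sae = sum(abs(r - p) for r, p in zip(rel, probs))
    chi2 = sum((n - N * p) ** 2 / (N * p) for n, p in zip(counts, probs))
    return sse, sae, chi2


# Made-up example: two groups, 100 families, fitted probabilities 0.5 and 0.5.
print(fit_measures([30, 70], [0.5, 0.5]))
print(grouped_mean([1, 3], [2.0, 6.0]))
```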

The generalized four-parameter log-F distribution appears to yield the best fit in terms of chi-square (in all three years) and SAE (in 2004 and 2005), while the generalized beta of the second kind (GB2) has the smallest SSE in 2003 and 2005 and the smallest SAE in 2003. Overall, the log-F performs well, which is consistent with Jones' belief that the log-F provides one of the most tractable instances of families with power and exponential tails. The two-parameter skew-t performs relatively poorly. As in McDonald (1984), the GB2 provides a better fit than the generalized beta of the first kind (GB1). Trailing behind the log-F and GB2 is the beta-Weibull, whose measures are very close to those of the beta-exponential. The three-parameter beta-exponential and the four-parameter beta-Weibull provide a better fit than the GB1 in terms of all goodness-of-fit measures. Though the skew-t has the second-worst performance, it appears to perform much better than the beta-normal, which performs noticeably the worst. Note that the normal distribution itself fits skewed data poorly.
Table 2
Estimated Results for 2003

             GB1         GB2         BN          skew-t      Log-F       BE          BW
α            1.955       0.490       2.348       7.822       3.468       1.700       1.748
β            2830.689    1.111       0.369       0.936       0.195       0.799       0.875
a            0.889       2.724       0.000       -           2.256       0.257       0.251
b            22685.000   8.297       4.012       -           2.294       -           0.982
Est. Means   6.517       6.618       6.642       7.474       6.017       6.498       6.485
1000*SSE     0.929       0.504       9.390       2.225       0.562       0.886       0.877
SAE          0.161       0.123       0.379       0.224       0.125       0.158       0.158
χ²           3816.534    1972.524    65394.750   6103.421    1926.019    3698.846    3707.640



Table 3
Estimated Results for 2004

             GB1         GB2         BN          skew-t      Log-F       BE          BW
α            1.680       0.485       2.328       7.649       3.226       1.608       1.552
β            6558.823    1.179       0.342       0.894       0.187       0.948       0.829
a            0.951       2.647       0.000       -           2.043       0.209       0.219
b            39366.000   8.951       3.942       -           2.039       -           1.024
Est. Means   6.679       6.784       6.818       7.883       6.075       6.681       6.684
1000*SSE     0.841       0.477       8.697       2.727       0.420       0.789       0.799
SAE          0.151       0.122       0.366       0.232       0.106       0.148       0.148
χ²           3501.860    1922.756    55209.277   6851.514    1451.627    3449.960    3428.916



Table 4
Estimated Results for 2005

             GB1         GB2         BN          skew-t      Log-F       BE          BW
α            1.672       0.449       2.311       7.734       3.254       1.609       1.547
β            3945.308    1.041       0.330       0.885       0.170       0.950       0.822
a            0.954       2.838       0.000       -           0.003       0.205       0.215
b            23068.000   8.773       3.937       -           0.942       -           1.027
Est. Means   6.811       7.145       6.931       8.116       6.256       6.801       6.801
1000*SSE     0.859       0.429       9.002       2.789       0.446       0.799       0.812
SAE          0.151       0.114       0.364       0.237       0.105       0.147       0.148
χ²           3529.821    1764.919    52066.738   7096.854    1525.312    3475.881    3452.047
Next, to give a better picture of how the tail of each fitted distribution matches the data, the estimated density functions based on the 2005 income data are presented in the following graph. The skewed-t appears to result in a thicker tail than the others.

In summary, the log-F provides the best relative fit, followed by the generalized beta of the second kind. Among the other distributions in the generalized beta-F family that were fit to the data, the beta-normal performs poorly. The two-parameter skew-t distribution could probably be extended to a four-parameter version whose mathematical properties, including moments and shapes, need further study.
[Figure: estimated density functions based on the 2005 income data; vertical axis: probability]